Allow nativeparse to parse source code directly by bzoracler · Pull Request #21260 · python/mypy

bzoracler · 2026-04-17T09:12:01Z

This is the mypy counterpart of mypyc/ast_serialize#54

bzoracler · 2026-04-17T09:20:31Z

The current CI failure is due to a changed typing signature of ast_serialize.parse::source; this has been fixed in the corresponding PR in mypyc/ast_serialize (see changed line).

bzoracler · 2026-04-28T04:48:21Z

CI failures:

Step Compiled with_mypyc: As before, this is fixed in https://github.com/bzoracler/ast_serialize/blob/566ddc362930a821549ca5fbb0d7d0f3bd88eb6e/ast_serialize.pyi#L26
These errors should be fixed by using the updated binaries built from the changes in "Allow parsing source code directly" (mypyc/ast_serialize#54):
- E TypeError: argument 'source': 'bytes' object is not an instance of 'str'
- E ValueError: Source parsing is not supported yet for test_trivial_binary_data_from_string_source
- E ValueError: Source parsing is not supported yet for testPackageRootMultipleParallel, testParallelRunWithSyntaxError, testCheckingStubPackagesWorksInParallelMode, and job Parallel tests with .*: I believe the code path for parallel checking causes both the source code and a file name to be passed to the parsing functions. I think the tests passed before because either the source argument was not passed or the file_exists check evaluated to False (and we fell back to the old parser when the file didn't exist).

Is it possible for CI to run a non-released version of ast_serialize?

ilevkivskyi

I have one comment for now. Also it looks like parallel type-checking is somehow broken by this.

bzoracler · 2026-05-17T23:12:56Z

~~@ilevkivskyi I don't quite know what's going on here. I checked out 1100800 (the commit before bumping ast-serialize to 0.5.0 on master), installed ast-serialize==0.4.0, and did this:~~

mypy/mypy/parse.py

Line 31 in 488a646

if options.native_parser:

-    if options.native_parser:
+    if options.native_parser and source:

Parallel checking on my machine crashes with just this change (so none of the changes in this PR were applied). Tracebacks are the same as those in e.g. https://github.com/python/mypy/actions/runs/25999007558/job/76418562048. Do you have any suggestions?

Oops, "parallel checking" would ~~try to use the default parser~~ not work at all in that case, my bad. I'll look at this in more depth.

ilevkivskyi · 2026-05-17T23:52:55Z

Hint: source is a required argument for parse(). Which value do you think was (and still is) passed there for the native parser, and how will your change in ast_serialize handle that?

ilevkivskyi · 2026-05-17T23:59:21Z

Btw, I added some logging, and it looks like we sometimes pass a non-empty source to parse(), which means there may be a possibility for performance optimization. Ideally we should not read a file in Python unless absolutely necessary, since it is much faster in Rust.

ilevkivskyi · 2026-05-18T00:27:08Z

Yeah, we eagerly read the file if there is only one file in the parse batch. Anyway, no need to fix it in this PR since this is a pre-existing problem, you can just fix the crash by passing an actual source (which should be None in most cases) instead of hard-coded "".
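The difference between passing None and a hard-coded "" can be sketched as below. This is only an illustration under stated assumptions: toy_parse and its parameters are hypothetical stand-ins, not the signature of mypy.parse.parse() or of ast_serialize. The assumed convention is that None means "read the file yourself", while any string, including the empty one, is taken as the complete module text.

```python
import os
import tempfile
from typing import Optional


def toy_parse(path: str, source: Optional[str] = None) -> int:
    """Hypothetical stand-in for a parser; returns the number of lines seen."""
    if source is None:
        # No in-memory text was supplied, so the parser reads the file itself.
        with open(path, encoding="utf-8") as f:
            source = f.read()
    # Any string, including "", is treated as the complete module text.
    return len(source.splitlines())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write("x = 1\ny = 2\n")
    lines_from_disk = toy_parse(path, None)  # the file on disk is read
    lines_hardcoded = toy_parse(path, "")    # the file on disk is ignored
```

Under this convention a hard-coded "" makes an existing file look like an empty module, which is why forwarding the actual source value matters.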

bzoracler · 2026-05-18T20:23:20Z

+                if not os.path.exists(path):
+                    build_error(
+                        "Cannot read file '{}': {}".format(
+                            path.replace(os.getcwd() + os.sep, ""),
+                            os.strerror(2),  # `errno.ENOENT`
+                        )
+                    )


This is temporary, I plan on making ast_serialize surface OSError instead in a follow-up.
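For readers unfamiliar with the pieces of that message, here is a small standalone sketch of how it is assembled. The helper name missing_file_message and the example paths are hypothetical; only errno.ENOENT, os.strerror() and the format string come from the standard library and the diff above. The exact text returned by os.strerror() is platform dependent, so it is not shown.

```python
import errno
import os


def missing_file_message(path: str, cwd: str) -> str:
    """Hypothetical helper mirroring the message built in the diff above."""
    # Drop the working-directory prefix so the path is shown relative to it.
    shown = path.replace(cwd + os.sep, "")
    # errno.ENOENT is the symbolic name for the literal 2 used in the diff.
    return "Cannot read file '{}': {}".format(shown, os.strerror(errno.ENOENT))


# Example paths are made up for illustration; nothing is read from disk.
cwd = os.path.join(os.sep, "work", "proj")
message = missing_file_message(os.path.join(cwd, "pkg", "mod.py"), cwd)
```

Using errno.ENOENT instead of the bare 2 would make the trailing comment in the diff unnecessary.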

bzoracler · 2026-05-18T20:35:29Z

@ilevkivskyi Appreciate the pointers. Some commentary:

- There are a few places where source="" is hard-coded in mypy.build; in 8e53191 I limited the change to what was needed to make the tests pass.
- Commit b029c44 was made because I didn't want mypy.parse.parse::source to be str | bytes | None, as I assume this parsing function is now used out in the wild.

ilevkivskyi

This is not ready.

In general, I feel like the ability to parse source should simplify things, not vice versa. A bunch of the logic was added solely for the purpose of diverting to the old parser in cases where there is no file.

Let's give this one more iteration (or I can simply do this myself).

ilevkivskyi · 2026-05-18T21:50:59Z

+                            path.replace(os.getcwd() + os.sep, ""),
+                            os.strerror(2),  # `errno.ENOENT`
+                        )
+                    )


I don't think this is a good place for this check. This is executed in a thread, instead it should be done before parsing, to match existing logic.

bzoracler · 2026-05-19T05:29:48Z

-        if post_parse:
-            self.post_parse_all(states)
+            # This duplicates a bit of logic from State.parse_file(). This is done to
+            # optimize handling of states parsed in parallel.


I've just copied the previous contents of def parse_parallel straight here, as I don't think State.parse_file() can be refactored very simply so that parallel parsing uses the same logic, even with removing the previous sequential states handling.

bzoracler · 2026-05-19T05:33:15Z

+            # Handle fake `__init__.py` files due to `--package-root`
+            if (
+                (source is None)
+                and (os.path.dirname(path) in self.fscache.fake_package_cache)
+                and (os.path.basename(path) == "__init__.py")
+            ):
+                source = ""


This replaces the previous handling of file_exists = self.fscache.exists(path, real_only=True) in the same method.

I think you should not need this if you follow my suggestion in the first comment above.

ilevkivskyi

OK, this is moving in the right direction. I have a few more comments.

ilevkivskyi · 2026-05-19T20:26:58Z

+                        state.xpath.replace(os.getcwd() + os.sep, ""),
+                        os.strerror(2),  # `errno.ENOENT`
+                    )
+                )


Hm, this is a bit annoying. I guess it is better to keep real_only parameter, then you will be able to write here:

if not self.fscache.exists(state.xpath, real_only=True):
    state.source = state.get_source()

This way you will not need this, and also will be able to remove the ugly check below.

I've generally avoided trying fixes that involved mutating state.source = outside of methods of State, but if that's ok, I've applied the suggestion.

Yes, it is OK in this case as it will only affect fake/synthetic files.

ilevkivskyi · 2026-05-19T20:29:50Z

-            sequential_states, parallel_states
-        )
+            for state in parallel_parsed_states:
+                # New parser only returns serialized ASTs


You modified this comment while copying, why?

The original reason was that "parallelize only those parts of the code that can be parallelized efficiently", to me, reads out of context when parse_parallel no longer handles a variable called sequential_states. But I've restored the original comment as-is.

It is not really about sequential states, it is about the general logic: in parse_file() we do (roughly): pre-parse, parse, post-parse. In parse_all() we do: pre-parse sequentially, parse in parallel, post-parse sequentially. This is done like this to avoid overhead of context switches in code that holds the GIL (pre-parse and post-parse).
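The three-phase shape described above (pre-parse sequentially, parse in parallel, post-parse sequentially) can be sketched in miniature. This is not mypy's code: pre_parse, parse, post_parse and parse_all here are hypothetical toy functions, and ast.parse is only a stand-in for the native parser. Unlike a native parser, ast.parse does not release the GIL, so the sketch shows the structure rather than any speedup.

```python
import ast
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def pre_parse(path: str, sources: Dict[str, str]) -> str:
    # Sequential phase: cheap bookkeeping in Python (here, a text lookup).
    return sources[path]


def parse(text: str) -> ast.Module:
    # Parallel phase: stand-in for the native parser call.
    return ast.parse(text)


def post_parse(tree: ast.Module) -> List[str]:
    # Sequential phase: collect the names of top-level functions.
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


def parse_all(sources: Dict[str, str]) -> Dict[str, List[str]]:
    paths = sorted(sources)
    texts = [pre_parse(p, sources) for p in paths]            # 1. sequential
    with ThreadPoolExecutor() as pool:
        trees = list(pool.map(parse, texts))                  # 2. parallel
    return {p: post_parse(t) for p, t in zip(paths, trees)}   # 3. sequential


result = parse_all({"a.py": "def f(): pass\n", "b.py": "x = 1\ndef g(): pass\n"})
```

Only the middle step is handed to the thread pool; the two Python-heavy steps stay on the calling thread, which is the point made in the comment about avoiding context switches in code that holds the GIL.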

ilevkivskyi · 2026-05-19T20:31:53Z

+            # Handle fake `__init__.py` files due to `--package-root`
+            if (
+                (source is None)
+                and (os.path.dirname(path) in self.fscache.fake_package_cache)
+                and (os.path.basename(path) == "__init__.py")
+            ):
+                source = ""


I think you should not need this if you follow my suggestion in the first comment above.

github-actions · 2026-05-20T20:29:25Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

ilevkivskyi

LG, thanks!

ilevkivskyi · 2026-05-21T00:40:09Z

@bzoracler If you are interested in working more in this direction, I think #21222 and #21514 are good next issues. (Btw it could make sense to fix them in the same PR, as they are somewhat related).

#21515 is another important thing, but it is waiting on input from @JukkaL

bzoracler · 2026-05-21T03:29:59Z

I'll take a look at #21222 and #21514 in the next few days.

#21515 looks quite involved so I'll pass on this for now, but I'm keen on seeing it resolved as it looks quite useful for plugins*, so if the work is green-lighted I may submit a PR in the next few weeks if I don't see any work done on it already.

*The Ruff parser allows parsing string forward expressions with leading spaces out of the box; Python's ast.parse() doesn't. So far I've resorted to hacks like surrounding the string contents representing expressions with () just to get the node column numbers right.
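The parenthesis workaround mentioned in the footnote can be illustrated with the standard library alone. This is a rough sketch, not code from any plugin: plain_parse_fails and names_with_columns are hypothetical helper names, and the column correction assumes ASCII text, since col_offset counts UTF-8 bytes.

```python
import ast
from typing import List, Tuple


def plain_parse_fails(text: str) -> bool:
    """Return True if ast.parse() rejects `text` as an expression."""
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return True
    return False


def names_with_columns(text: str) -> List[Tuple[str, int]]:
    """Parse `text` wrapped in parentheses and report original name columns."""
    tree = ast.parse("(" + text + ")", mode="eval")
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # The added "(" shifts columns by one, but only on the first line.
            shift = 1 if node.lineno == 1 else 0
            found.append((node.id, node.col_offset - shift))
    return sorted(found, key=lambda item: item[1])


rejected = plain_parse_fails(" List[int]")
names = names_with_columns(" List[int]")
```

Here names holds the columns relative to the original string, with the one-character shift from the added parenthesis undone.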

bzoracler mentioned this pull request Apr 17, 2026

Allow parsing source code directly mypyc/ast_serialize#54

Merged

bzoracler added 3 commits April 28, 2026 14:03

Enable native parsing to use source directly

2e793ae

Remove file_exists parameter from mypy.parse.parse() calls

50b0860

Test for invalid bytes

ac275e4

bzoracler force-pushed the nativeparse-source branch from c8c10dd to ac275e4 Compare April 28, 2026 02:28

Fix omitted argument

149e459

bzoracler force-pushed the nativeparse-source branch from 444f4e9 to 149e459 Compare April 28, 2026 03:07

[pre-commit.ci] auto fixes from pre-commit.com hooks

47e45f7

for more information, see https://pre-commit.ci

bzoracler marked this pull request as draft April 28, 2026 03:10

bzoracler marked this pull request as ready for review April 28, 2026 04:48

hauntsaninja added the upnext label May 10, 2026

Merge branch 'master' into nativeparse-source

96e8e07

ilevkivskyi pushed a commit to mypyc/ast_serialize that referenced this pull request May 17, 2026

Allow parsing source code directly (#54)

87b7843

Resolves #21 Tests are part of python/mypy#21260

ilevkivskyi mentioned this pull request May 17, 2026

Bump ast-serialize to 0.5.0 #21501

Merged

ilevkivskyi added a commit that referenced this pull request May 17, 2026

Bump ast-serialize to 0.5.0 (#21501)

488a646

This is mostly needed for #21260

Merge branch 'master' into nativeparse-source

beba478

ilevkivskyi reviewed May 17, 2026

Comment thread mypy/build.py

bzoracler marked this pull request as draft May 17, 2026 19:13

Remove fscache existence checks and parallel workarounds

99c7610

bzoracler marked this pull request as ready for review May 18, 2026 20:21

bzoracler commented May 18, 2026

bzoracler requested a review from ilevkivskyi May 18, 2026 20:35

ilevkivskyi reviewed May 18, 2026

bzoracler added 2 commits May 19, 2026 11:39

Revert "Temporarily fix test"

df34d0b

This reverts commit 2a523b5.

Revert "Refactor to allow a safer source=None"

49c65b8

bzoracler marked this pull request as draft May 18, 2026 23:57

bzoracler added 3 commits May 19, 2026 12:42

Handle source=None in parse() function

f47a898

Check for file existence before parallel parsing

8cfffa0

Handle source=None when --package-root is set

b9cc0b5

bzoracler added 2 commits May 19, 2026 16:35

Simplify handling of --package-root

20b035e

Inline parallel parsing

52422db

bzoracler commented May 19, 2026

bzoracler marked this pull request as ready for review May 19, 2026 05:36

bzoracler requested a review from ilevkivskyi May 19, 2026 05:45

ilevkivskyi reviewed May 19, 2026

bzoracler added 3 commits May 21, 2026 08:00

Restore real_only check and simplify existence check

d155dc6

Restore inline configuration application comment

630afc8

Restore comment changes to State.parse_file()

efa9a63

bzoracler requested a review from ilevkivskyi May 20, 2026 21:17

ilevkivskyi approved these changes May 21, 2026

ilevkivskyi merged commit 6f0e77b into python:master May 21, 2026
25 checks passed

bzoracler deleted the nativeparse-source branch May 21, 2026 03:17
